Applying Machine Learning Methods to Forecasting Customer Churn for a Telecommunications Company

Sholom-Aleichem Priamursky State University

70A, Shirokaya street 679015 Birobidzhan Russian Federation

Maritime State University named after G.I. Nevelskoy

50A, Verkhneportovaya street 690059 Vladivostok Russian Federation

Sumy State University

2, Rymskogo-Korsakova street 40007 Sumy Ukraine

Irkutsk National Research Technical University

83, Lermontova street 664074 Irkutsk Russian Federation

Keywords: Machine Learning, Customer Churn, Telecommunications Company, Area Under the Curve, Ensemble of Techniques

The paper presents a brief overview of existing approaches to predicting customer churn, using a telecommunications company as an example. The authors study churn forecasting with 11 different machine learning methods. The training data contained 20 attributes describing each client. Training quality was assessed with the Area Under the Curve (AUC) metric. The paper also presents results confirming that ensembles of machine learning methods improve the quality of customer churn prediction.

Introduction

The number of clients is an important parameter for any company: the more clients, the higher the company's profit. No company holds a monopoly on mobile services, high-speed Internet access or cable television, so customers may leave for one reason or another, most often to switch to another telecommunications company that offers what they see as more favorable conditions. To operate stably, a telecommunications company needs to analyze its customer base, for example to run targeted promotions, and to analyze the reasons for customer churn, including predicting how many customers will leave and why they switch to another company's services. According to experts, attracting one new client costs companies on average seven times more (in some cases, 5 to 20 times more) than retaining an existing one [1][2][3]. Reducing customer churn by 5% increases the company's profits by 25% to 85% [4]. Understanding how to maintain customer engagement is therefore a natural foundation for developing customer retention strategies.

Accordingly, a number of studies aim at predicting customer churn and at predicting customer preferences for specific services. For example, the authors of [5] study how customers behave when paying bills: using logistic regression, One Rule and support vector machines, they propose an analytical approach to studying and predicting customer payment behavior. The inputs to their analytical system are: customer ID (a hashed unique number identifying each customer), action type (such as SMS, IVR, phone calls, service-cut-related and legal actions), the date on which the action was executed, and a stage changer flag (such as payment through the banking system, unpaid invoice occurrence, or action timer flag). Machine learning methods are suitable for such tasks because they can detect patterns in data of diverse nature and origin [6].

In [7], Call Data Records are used to identify various types of customers, which in turn informs the analysis of how likely they are to leave and how their departure affects the company's other customers. The authors distinguish the following customer types: follower, standard, leader, core and important. They note that customers who interact with members of the Leader and Important categories are more likely to churn once such an influential member has left the company. For clustering, the authors used neural networks such as the linear perceptron, the multilayer perceptron and radial basis function networks. In [8], clients were segmented with a random forest classifier, a decision tree classifier, a gradient-boosted tree classifier and a multilayer perceptron according to the "time-frequency-monetary" approach. Here "Time" is the total duration of calls and Internet sessions in a certain period, "Frequency" is how often services are used within a certain period, and "Monetary" is the amount of money spent during a certain period.

It should be noted that different studies often build their churn forecasts on entirely different sets of criteria. For example, in [9], forecasting is based on the following information: the region where the customer lives; the time since the customer joined the operator (in months); average revenue; how long the customer has not paid the bills; the amount that is overdue; the number of times the service was disconnected, etc. In [10], the following parameters are used to predict customer churn: the US state in which the customer resides, the seven-digit phone number, the total number of calling minutes used during the day, the billed cost of daytime calls, the total number of calling minutes used during the nighttime, and a few others. Moreover, some of these criteria may not be available to telcos, which makes it difficult to see how a telco could apply these models in practice. It is also worth mentioning several studies [11][12] on customer analysis in the banking sector, for example to predict the likelihood of outflow from customers' sociodemographic parameters, their activity in obtaining banking information, information about their salaries, etc.

There is no single best choice among machine learning methods: there are many of them, and a suitable method can hardly be picked in advance. Candidate models have to be trained, and the optimal architecture and parameters of the method selected empirically.

This paper presents the results of predicting customer churn for a telecommunications company using machine learning methods, which can use statistical data to identify dependencies between data of different nature.

Preparation of training sample

The problem of predicting customer churn can be formulated as a binary classification problem [9]. Solving it means assigning each customer, described by the corresponding characteristics (parameters) x ∈ X, to one of the two classes Y = {no churn, churn}.

To solve this problem, we used a dataset of 21 columns and 7043 rows made publicly available by IBM, which contains information about customers of a telecommunications company (available at https://www.kaggle.com/blastchar/telco-customer-churn). Each customer is assigned a unique identifier (customerID) and is described by the characteristics presented in Tables 1-3, which serve as identification signs for predicting churn.

The input information also includes the period during which the customer has used the company's services, in months (tenure); the client's monthly fee (MonthlyCharges); and the total amount paid over the whole period of work with the client (TotalCharges). For each set of input information, consisting of 20 indicators, there is a predetermined output, Churn, which indicates whether the client with these parameters left the telecommunications company.
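The preparation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the column names follow the public dataset, while the helper names (`NUMERIC_COLUMNS`, `encode_row`, `load_dataset`), the treatment of blank numeric fields as 0.0, and the two sample records are assumptions made for the example.

```python
import csv
import io

# Column names follow the public Telco Customer Churn CSV; the helper names
# below are hypothetical and do not come from the paper.
NUMERIC_COLUMNS = ("tenure", "MonthlyCharges", "TotalCharges")


def encode_row(row):
    """Split one CSV record into (features, label), dropping customerID."""
    label = 1 if row["Churn"].strip() == "Yes" else 0
    features = {}
    for name, value in row.items():
        if name in ("customerID", "Churn"):
            continue
        value = value.strip()
        if name in NUMERIC_COLUMNS:
            # Assumption: a blank numeric field is treated as 0.0.
            features[name] = float(value) if value else 0.0
        else:
            # Categorical answers stay as strings for later one-hot encoding.
            features[name] = value
    return features, label


def load_dataset(text_stream):
    """Read all records from an open CSV text stream."""
    rows = [encode_row(record) for record in csv.DictReader(text_stream)]
    X = [features for features, _ in rows]
    y = [label for _, label in rows]
    return X, y


# Tiny made-up sample with only a few of the 21 columns, for illustration.
sample = io.StringIO(
    "customerID,gender,tenure,MonthlyCharges,TotalCharges,Churn\n"
    "0001-AAAA,Female,1,29.85,29.85,No\n"
    "0002-BBBB,Male,0,56.95, ,Yes\n"
)
X, y = load_dataset(sample)
```

The identifier is dropped because it carries no predictive information, and the label is mapped to 1 for churn and 0 for no churn, matching the two classes of the binary classification problem.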

Modeling machine learning methods

The following machine learning methods, which have proven themselves in solving various classification problems, were selected for training: AdaBoost (adaptive boosting); Decision Tree (decision trees); Extra Tree Classifier (extremely randomized trees); Gradient Boosting (gradient boosting); KNeighbors (k-nearest neighbors method); Logistic Regression (logistic regression); Naive Bayes (naive Bayes classifier); Neural Network (neural networks); Random Forest (random forest method); SVM (support vector machine); XGB (gradient boosting on trees).

To assess the effectiveness of the models, we chose the Area Under the Curve (AUC), a statistical indicator widely used in machine learning: the area bounded by the ROC curve (receiver operating characteristic) and the abscissa axis. AUC is popular for binary classification problems because of its simplicity, its intuitive interpretation [9], and its applicability to imbalanced datasets [13]. For a random classifier the AUC is 0.5, and for an ideal classifier the AUC is 1 [14]. Since the problem has only two possible outcomes (the client leaves or stays), ROC analysis, a graphical method for assessing the quality of a binary classifier, is well suited to evaluating the proposed method for predicting the churn of customers of a telecommunications company. The ROC curve is constructed from pairs of two values: sensitivity and specificity. Sensitivity is the share of true-positive classifications in the total number of positive observations and is plotted on the vertical axis of the ROC graph. Specificity is the share of true-negative classifications in the total number of negative observations; the horizontal axis shows 1 − specificity, that is, the share of false-positive classifications. The higher the sensitivity, the more reliably the classifier recognizes positive observations, and the higher the specificity, the more reliably it recognizes negative ones. Thus, the ROC curve reflects the relationship between the probability of a false alarm (share of false-positive classifications) and the probability of correct detection (share of true-positive classifications). As sensitivity increases, positive observations are recognized more reliably (the probability of "missing a target" decreases), but the probability of a false alarm increases as well.

Table 4 shows the results of training the individual machine learning methods, assessed by the AUC metric. Three methods give the best results: LogisticRegression, AdaBoostClassifier, and XGB. For example, logistic regression achieves an AUC of 0.8038. The authors then carried out computer modeling to identify ensembles of methods that best solve the problem of predicting the outflow of customers of a telecommunications company. For this, various combinations of the above machine learning methods were considered (Table 5 shows a fragment of the results obtained).
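The ROC construction and the AUC metric used above can be illustrated with a short, dependency-free Python sketch. It is not the authors' implementation: the function names `roc_points` and `auc` and the sample labels and scores are hypothetical, and the area is computed with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """Return (false-positive rate, sensitivity) pairs for every threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the decision threshold step by step, from the highest score down.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process tied scores together so they yield a single ROC point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # x = 1 - specificity (false alarms), y = sensitivity (detections).
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up labels (1 = churn) and classifier scores, for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
area = auc(curve)
```

In this toy sample, eight of the nine (churn, no churn) pairs are ranked correctly, so the area equals 8/9. A classifier that gives every customer the same score produces the diagonal and an AUC of 0.5.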

Optimizing the ensembles by the criterion of maximum AUC identified the best ensemble, which consists of Logistic Regression, Gradient Boosting and XGB and achieves an AUC of 81.37%.
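A search over three-method ensembles by maximum AUC might look like the sketch below. The paper does not state how the members' outputs are combined, so averaging the predicted churn probabilities (soft voting) is an assumption here, as are the function names and the made-up validation scores.

```python
from itertools import combinations


def auc_score(y_true, scores):
    """AUC as the share of (churn, no-churn) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_scores(score_lists):
    """Soft voting (an assumption): average the members' probabilities."""
    return [sum(column) / len(column) for column in zip(*score_lists)]


def best_ensemble(y_true, model_scores, size=3):
    """Try every combination of `size` models and keep the highest AUC."""
    best_names, best_auc = None, -1.0
    for names in combinations(sorted(model_scores), size):
        combined = average_scores([model_scores[name] for name in names])
        value = auc_score(y_true, combined)
        if value > best_auc:
            best_names, best_auc = names, value
    return best_names, best_auc


# Made-up validation labels and per-model churn probabilities.
y_val = [1, 0, 1, 0]
model_scores = {
    "logistic_regression": [0.9, 0.4, 0.6, 0.5],
    "gradient_boosting": [0.8, 0.3, 0.7, 0.2],
    "xgb": [0.7, 0.6, 0.8, 0.1],
    "decision_tree": [0.1, 0.9, 0.1, 0.9],
}
names, value = best_ensemble(y_val, model_scores)
```

With 11 candidate methods there are only 165 combinations of three, so an exhaustive search of this kind is cheap once each model's scores on a held-out set are available.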

Conclusion

Thus, the authors investigated 11 different machine learning methods for predicting the churn of customers of a telecommunications company. The best individual method was LogisticRegression, which showed an AUC of 80.38%. Using Logistic Regression together with other methods in an ensemble increased the forecasting quality to an AUC of 81.37%. Further research will be devoted to identifying the identification features that significantly affect the prediction of customer churn, so that fewer features can be used without loss of forecast quality.

Table 1. Description of identification signs that have two answer options.

| No. | Identification sign | Churn customers | Non Churn customers |
|---|---|---|---|
| 1 | Gender of the client (gender) | Male: 50.7%; Female: 49.3% | Male: 50.2%; Female: 49.8% |
| 2 | Whether the user is a Senior Citizen | 0: 87.1%; 1: 12.9% | 0: 74.5%; 1: 25.5% |
| 3 | The client has a partner (Partner) | Yes: 52.8%; No: 47.2% | Yes: 35.8%; No: 64.2% |
| 4 | Client has Dependents | Yes: 34.3%; No: 65.7% | Yes: 17.4%; No: 82.6% |
| 5 | The indicator of the client's phone number (PhoneService) | Yes: 90.12%; No: 9.88% | Yes: 90.9%; No: 9.1% |
| 6 | Paperless Billing | Yes: 53.6%; No: 46.4% | Yes: 74.9%; No: 25.1% |

Table 2. Description of identification signs that have three answer options.

| No. | Identification sign | Churn customers | Non Churn customers |
|---|---|---|---|
| 1 | The indicator of the presence of several communication lines at the client (MultipleLines) | Yes: 41%; No: 49.12%; No phone service: 9.88% | Yes: 45.5%; No: 45.4%; No phone service: 9.1% |
| 2 | The type of communication line wire that the client is using (InternetService) | DSL: 37.9%; Fiber optic: 34.8%; No: 27.3% | DSL: 24.6%; Fiber optic: 69.4%; No: 6% |
| 3 | Customer's Internet Security Score (OnlineSecurity) | Yes: 33.3%; No: 39.4%; No internet service: 27.3% | Yes: 15.8%; No: 78.2%; No internet service: 6% |
| 4 | An indicator of whether the customer is backing up (OnlineBackup) | Yes: 36.8%; No: 35.9%; No internet service: 27.3% | Yes: 28%; No: 66%; No internet service: 6% |
| 5 | Indicator, the presence of device protection (DeviceProtection) | Yes: 36.3%; No: 36.4%; No internet service: 27.3% | Yes: 29.2%; No: 64.8%; No internet service: 6% |
| 6 | Indicator whether the user has technical protection (TechSupport) | Yes: 33.5%; No: 39.2%; No internet service: 27.3% | Yes: 16.6%; No: 77.4%; No internet service: 6% |
| 7 | The indicator of whether the client has live television broadcasts (StreamingTV) | Yes: 36.6%; No: 36.2%; No internet service: 27.3% | Yes: 43.6%; No: 50.4%; No internet service: 6% |
| 8 | The indicator of whether the client has streaming movies (StreamingMovies) | Yes: 37.1%; No: 35.6%; No internet service: 27.3% | Yes: 43.8%; No: 50.2%; No internet service: 6% |
| 9 | Type of the concluded payment agreement (Contract) | Month-to-month: 43%; One year: 25.3%; Two year: 31.7% | Month-to-month: 88.6%; One year: 8.88%; Two year: 2.52% |

Table 3. Description of identification signs that have four answer options.

| No. | Identification sign | Churn customers | Non Churn customers |
|---|---|---|---|
| 1 | Client payment type (PaymentMethod) | Mailed check: 25.1%; Electronic check: 25.1%; Credit card (automatic): 25%; Bank transfer (automatic): 24.8% | Mailed check: 16.5%; Electronic check: 57.3%; Credit card (automatic): 12.4%; Bank transfer (automatic): 13.8% |

Table 4. Qualitative assessment of predicting customer churn using machine learning methods.

| No. | Machine learning method | AUC value |
|---|---|---|
| 1 | LogisticRegression | 0.8038 |
| 2 | AdaBoostClassifier | 0.8024 |
| 3 | XGB | 0.8024 |
| 4 | GradientBoostingClassifier | 0.7962 |
| 5 | RandomForest | 0.7815 |
| 6 | ExtraTreesClassifier | 0.7725 |
| 7 | SVM | 0.7687 |
| 8 | KNeighbours | 0.7659 |
| 9 | Neural Network | 0.763 |
| 10 | Naive Bayes | 0.7602 |
| 11 | DecisionTree | 0.7408 |

Table 5. Results of computer modeling of ensembles of machine learning methods.

| No. | 1st ensemble machine learning method | 2nd ensemble machine learning method | 3rd ensemble machine learning method | AUC value (%) |
|---|---|---|---|---|
| 1 | Logistic Regression | Gradient Boosting | gradient boosting on trees (XGB) | 81.37 |
| 2 | gradient boosting on trees (XGB) | adaptive boosting (AdaBoost) | Gradient Boosting | 80.71 |
| 3 | k-nearest neighbors method (KNeighbors) | gradient boosting on trees (XGB) | neural networks | 80.62 |
| 4 | Logistic Regression | Gradient Boosting | k-nearest neighbors method (KNeighbors) | 80.23 |
| 5 | Logistic Regression | Gradient Boosting | support vector machine (SVM) | 80.2 |
| 6 | Logistic Regression | Gradient Boosting | random forest | 80 |
| 7 | random forest | support vector machine (SVM) | neural networks | 79.9 |
| 8 | k-nearest neighbors method (KNeighbors) | Logistic Regression | naive Bayes classifier | 79.43 |
| 9 | k-nearest neighbors method (KNeighbors) | naive Bayes classifier | neural networks | 79.43 |
| 10 | k-nearest neighbors method (KNeighbors) | naive Bayes classifier | gradient boosting on trees (XGB) | 79.34 |

References

[1] W. Verbeke. Building comprehensible customer churn prediction models with advanced rule induction techniques. Expert Systems with Applications, 38(3), 2011.

[2] S. A. Qureshi. Telecommunication subscribers' churn prediction model using machine learning. Eighth International Conference on Digital Information Management (ICDIM), Islamabad, IEEE, 2013.

[3] B. Balle. The architecture of a churn prediction system based on stream mining. Artificial Intelligence Research and Development, 256, 2013.

[4] H. Jain, A. Khunteta, S. Srivastava. Churn Prediction in Telecommunication using Logistic Regression and Logit Boost. Procedia Computer Science, 167, 2020.

[5] M. Bahrami, B. Bozkaya, S. Balcisoy. Using Behavioral Analytics to Predict Customer Invoice Payment. Big Data, 8(1), 2020.

[6] S.-M. Xie. Comparative models in customer base analysis: parametric model and observation-driven model. Journal of Business Economics and Management, 21(6), 2020.

[7] S. M. Kostić, M. I. Simić, M. V. Kostić. Social Network Analysis and Churn Prediction in Telecommunications Using Graph Theory. Entropy, 22(7), 753, 2020.

[8] W. N. Wassouf. Predictive analytics using big data for increased customer loyalty: Syriatel Telecom Company case study. Journal of Big Data, 7, 29, 2020.

[9] B. Baesens. Profit Driven Decision Trees for Churn Prediction. European Journal of Operational Research, 284(3), 2020.

[10] H. Faris. A Hybrid Swarm Intelligent Neural Network Model for Customer Churn Prediction and Identifying the Influencing Factors. Information, 9(11), 288, 2018.

[11] C. Alexandru. Propensity to Churn in Banking: What Makes Customers Close the Relationship with a Bank? Economic Computation & Economic Cybernetics Studies & Research, 54(2), 2020.

[12] S. E. Schaeffer, S. V. R. Sánchez. Forecasting client retention - a machine-learning approach. Journal of Retailing and Consumer Services, 52, 101918, 2020.

[13] A. K. Ahmad, A. Jafar, K. Aljoumaa. Customer churn prediction in telecom using machine learning in big data platform. Journal of Big Data, 6(1), 28, 2019.

[14] B. Zhu. Benchmarking sampling techniques for imbalance learning in churn prediction. Journal of the Operational Research Society, 69(1), 2018.