=Paper= {{Paper |id=Vol-3611/paper15 |storemode=property |title=Telecommunication customer churn prediction using machine learning methods |pdfUrl=https://ceur-ws.org/Vol-3611/paper15.pdf |volume=Vol-3611 |authors=Monika Zdanavičiūtė,Rūta Juozaitienė,Tomas Krilavičius |dblpUrl=https://dblp.org/rec/conf/ivus/ZdanaviciuteJK22 }} ==Telecommunication customer churn prediction using machine learning methods== https://ceur-ws.org/Vol-3611/paper15.pdf
                                Telecommunication customer churn prediction using machine
                                learning methods
                                Monika Zdanavičiūtė1,2, Rūta Juozaitienė1,2,3 and Tomas Krilavičius1,2
                                1 Vytautas Magnus University, Faculty of Informatics, Vileikos street 8, LT-44404 Kaunas, Lithuania
                                2 Centre for Applied Research and Development, Lithuania
                                3 Vilnius University, Vilnius, Lithuania



Abstract
The telecommunication sector has grown significantly in recent years due to the use of smart technologies, and it is likely to continue to grow. The main resource of telecommunications companies is their customers, but because competition in this field is relatively high, most customers are not tied to a single service company. To understand the key factors contributing to the customer churn rate, we analysed real data from one telecommunication company. The data, covering 2020-01-01 to 2022-03-07, consisted of information on 21128 users, 140970 payments and 350379 calls. The main contribution of our work is a churn prediction model that identifies the customers who are most likely to churn. We performed experiments using k-nearest neighbours, support vector machine, decision tree, random forest and naive Bayes classifiers and the Cox proportional hazards model with time-varying covariates. Results showed that the Cox regression model with time-varying covariates was superior to classical classification methods because it can take into account static user parameters and reflect their changes over time.

Keywords
Churn prediction, telecommunication churn, survival analysis, churn, telecommunications


IVUS 2022: 27th International Conference on Information Technology, May 12, 2022, Kaunas, Lithuania
monika.zdanaviciute@vdu.lt (M. Zdanavičiūtė); ruta.juozaitiene@vdu.lt (R. Juozaitienė); tomas.krilavicius@vdu.lt (T. Krilavičius)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

1. Introduction

The telecommunication sector has grown significantly in recent years due to the use of smart technologies, and it is likely to continue to grow. The main resource of telecommunications companies is their customers, but because competition in this field is relatively high, most customers are not tied to a single service company. Therefore, to build a successful business in this field, it is necessary to know the client, together with the client's needs and opportunities. Data collected by telecommunication companies on a daily basis can be very helpful in gaining a proper understanding of customer behavior. Analysis of such data is needed to understand which factors may be associated with a customer leaving the customer base. The main goal of this research is to develop a churn prediction model that identifies the customers who are most likely to churn.

2. Literature review

The term customer churn can be described as the loss of customers to a company [1]. Due to the specifics of telecommunications companies, this is a common business problem in this area. To retain customers, companies try in a variety of ways to predict which consumers are going to leave.

A genetic algorithm has been proposed to identify customers who intend to change their telecommunications company in the near future [2]. The database used in the study consisted of call data for 5250 customers. Each user's profile consisted of information about the user's behavior and habits (average monthly costs for local and international calls, average amount of internet data, average monthly call time, amount of roaming and special services used). In the developed model, the genetic algorithm creates groups of similar elements by iteratively adjusting the coordinates of each profile in the plane. The efficiency of the model was evaluated by observing the change in the error function in each iteration. The algorithm grouped customers into four clusters: 1% of very high-spending customers, 9% of high and medium-spending customers, 12% of medium-spending customers, and 78% of low-spending customers.

The study [3] was conducted by analyzing data from 7043 telecommunication customers, 1869 of whom had already left the customer base. Each customer is described by 21 variables, one of which is a binary variable that indicates whether the user has already left the company. Using XGBoost (Extreme Gradient Boosting Tree), k-Nearest Neighbors and Random Forest methods, customers are classified
into two groups according to this variable. Accuracy and F-score were used to assess the models and showed that XGBoost was the best classifier. In addition, this method was used to find out which variables most influence customer exit. This study found that customers with higher monthly charges are more likely to churn.

Data from the SyriaTel telecommunications service provider were used for the study [4]. The analyzed period is 9 months (about 10 million users), and the available information includes data about the client (age, gender, place of residence, type of contract concluded, services received), the client's actions (calls, messages, and internet usage), the mobile device (device type, brand, model), and the telecommunications tower infrastructure. To better describe users, the available data was used to create a social network of all customers and to calculate, for each customer, variables such as degree centrality measures, similarity values, and network connectivity. For model training and testing, data were separated into training (70%) and testing (30%) sets. Because the data sets were unbalanced (there were significantly more outgoing customers than active ones), the classification was done in two ways: by balancing the sets and by applying the data as it is. Using Decision Trees, Random Forest, GBM (Gradient Boosted Machine Tree), and XGBoost algorithms, customers were classified into two classes: churned and existing customers. The AUC (area under the curve) was used to determine accuracy. The obtained results showed that the XGBoost algorithm classifies customers best according to the available data.

The data set for the study [5] consists of call records obtained from the University of California, Department of Information and Computer Science. The data set describes mobile service use by 3333 customers and consists of 15 quantitative variables, 5 categorical variables and a binary variable describing whether the customer has left the customer base of the telecommunications service provider. In the analysis of the available call data, each user is assigned variables describing the user's call habits and, using classification methods, the customers are divided into two classes according to said binary variable. The research uses Neural Networks, Support Vector Machine and Bayesian classification methods. The data set is divided into training (80% of all data) and testing (20% of all data) sets so that the training set is balanced. Then 95% of the customers who leave and 5% of the existing ones remain in the testing set. The study revealed that the Support Vector Machine best separates customers into two groups.

The study [6] uses telecommunication user data for customer analysis. The data stores basic user information (age, gender, etc.) and plan order information (payment method, monthly fee, full-time fee, etc.). It also provides information about the services (telephones, internet, television, insurance, etc.) and information on whether the customer is active or has already churned. Clustering (k-Means, DBSCAN) and classification methods (Multi-Layer Perceptron, Back Propagation algorithm, Decision Trees, Logistic Regression, Support Vector Machine) were used to analyze this data. The classification models were evaluated with several Precision and Accuracy measures, and the Back Propagation algorithm and Multi-Layer Perceptron best predicted customer withdrawal. In the clustering analysis, the DBSCAN algorithm separated active and inactive clients better.

The customer loyalty task is usually formulated as a classification task whose data set consists of active and churned users. To solve this problem, the literature suggests the use of k-Nearest Neighbors [3], Neural Networks [5][6], Support Vector Machines [5][6], and Bayesian classifiers [5]. Decision Tree, Random Forest, and XGBoost algorithms can also be used to analyze customer loyalty [3][4]. However, there are also cases when this problem is solved by applying clustering methods, e.g. a genetic algorithm [2] or the k-means algorithm [6]. This type of task is usually based on user information, as well as payment history and call data. Research shows that customers who pay more for services tend to change telecommunications operators.

3. Methods

1. k-Nearest Neighbors is an algorithm that stores all available cases and classifies new cases based on a similarity measure (distance functions). Euclidean distance function [7]:

   $$\sqrt{\sum_{i=1}^{k} (x_i - y_i)^2} \qquad (1)$$

2. Support Vector Machine (SVM) performs classification by finding the hyperplane that maximizes the margin between the two classes [7]. Hyperplane equation:

   $$w^T x + b = 0 \qquad (2)$$

   To define an optimal hyperplane we need to maximize the width of the margin, which is determined by $w$:

   $$\max \frac{2}{\|w\|} \qquad (3)$$

3. Decision Tree is a flowchart-like structure in which each internal node represents a "test" on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label [8]. A quantitative measure of randomness, entropy, is used to select a feature in a node. The initial entropy of the set $E$:

   $$H(E) = -\sum_{k_i \in K} P(k_i|E) \log_2 P(k_i|E), \qquad (4)$$

   where

   $$P(k_i|E) = \frac{|\{e : e \in E, e \in k_i\}|}{|E|}, \qquad (5)$$

   mean entropy of $E_1, \ldots, E_n$ after division:

   $$B(E, p) = \sum_{j=1}^{n} P(v_j|E) H(E_j), \qquad (6)$$

   where

   $$P(v_j|E) = \frac{|E_j|}{|E|}. \qquad (7)$$

4. Random Forest is an ensemble learning method for classification tasks that operates by constructing a multitude of decision trees at training time. The output of the random forest is the class selected by most trees [9].

5. Naïve Bayes classifier assumes that the effect of the value of a predictor ($x$) on a given class ($c$) is independent of the values of other predictors [7]. This assumption is called class conditional independence.

   $$P(c|x) = \frac{P(x|c)P(c)}{P(x)}. \qquad (8)$$

6. Cox proportional hazards model with time-varying covariates is a method for investigating the effect of several variables upon the time a specified event takes to happen. In a Cox proportional hazards regression model, the measure of effect is the hazard rate. Hazard function for individual $i$:

   $$h_i(t) = h_0(t) \exp(\beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_n x_{in}) \qquad (9)$$

   where $h_0(t)$ is the baseline hazard function, $x_{i1}, x_{i2}, \ldots, x_{in}$ are covariates, and $\beta_1, \beta_2, \ldots, \beta_n$ are regression coefficients.

7. Confusion matrix was used to assess the accuracy of user classification. Elements of the confusion matrix:

   • TP (true positive): the user is expected not to churn and remains.
   • TN (true negative): the user is expected to churn and churns.
   • FP (false positive): the user is expected to remain but churns.
   • FN (false negative): the user is expected to churn but remains.

4. Data set

The analyzed data consists of three data sets in the range from 2020-01-01 to 2022-03-07:

1. Users data set. Individual user information, which includes demographic and other data provided during registration. This study analyzes 21128 users.
2. Payments data set. 140970 payment records showing when and what type of plan was purchased and how much it cost. There are two types of plans: monthly and yearly.
3. CDR (call detail record) data set. Real-time data records documenting telephone calls or other telecommunications operations (3350379 records).

After the data transformations, a list of variables describing the users was created (Table 1).

Table 1
Created user-defining variables

  Created variables used for churn prediction
  -------------------------------------------
  Total amount of seconds
  Total amount of calls
  Number of failed calls
  Ratio of failed calls to total calls
  The amount of not failed calls
  Total amount of active days
  Mean call duration
  Max call duration
  Median days between calls
  Median days between active days
  Number of contacts called
  Total amount of purchased plans
  Last plan before (amount of days)
  Total amount paid
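As a rough illustration of how some of the Table 1 variables could be derived from call detail records, the following Python sketch computes a subset of them for one user. The `Call` record, its field names, and the choice to count a failed call as a zero-second call are assumptions made for this example; the paper does not describe the actual CDR schema or the transformation code.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean, median


@dataclass
class Call:
    """One hypothetical CDR row; the field names are assumptions."""
    day: date      # calendar day on which the call was made
    callee: str    # identifier of the contact called
    seconds: int   # call duration in seconds (0 for a failed call)
    failed: bool   # True if the call did not connect


def user_features(calls):
    """Compute a subset of the Table 1 variables from one user's calls."""
    if not calls:
        raise ValueError("at least one call is required")
    total_calls = len(calls)
    failed_calls = sum(c.failed for c in calls)
    durations = [c.seconds for c in calls]
    active_days = sorted({c.day for c in calls})
    # Gaps in days between consecutive active days (empty for one active day).
    gaps = [(b - a).days for a, b in zip(active_days, active_days[1:])]
    return {
        "total_seconds": sum(durations),
        "total_calls": total_calls,
        "failed_calls": failed_calls,
        "failed_ratio": failed_calls / total_calls,
        "successful_calls": total_calls - failed_calls,
        "active_days": len(active_days),
        "mean_call_duration": mean(durations),
        "max_call_duration": max(durations),
        "median_days_between_active_days": median(gaps) if gaps else None,
        "contacts_called": len({c.callee for c in calls}),
    }


# Made-up calls for a single user, for illustration only.
example = [
    Call(date(2021, 5, 1), "A", 60, False),
    Call(date(2021, 5, 1), "B", 0, True),
    Call(date(2021, 5, 4), "A", 120, False),
    Call(date(2021, 5, 9), "C", 30, False),
]
features = user_features(example)
print(features)
```

The payment-based variables (number of purchased plans, days since the last plan, total amount paid) would come from the payments data set in the same way and are omitted here.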
5. Churn definition

In order to assess the risk of customer churn, churn must first be defined. Since a user leaving the customer base can be described in several ways, it is necessary to monitor client behavior and changes in activity and decide which definition best describes churn. In this study, user churn is described in two different ways, and a different problem-solving method is used for each of them.

1. The user is classified as a churned customer if no new plan has been purchased within 35 days after the first plan purchase.

   Figure 1 shows a bar graph of the distribution of the number of plans purchased by customers. It shows that most customers have bought only one plan.

   Figure 1: Frequency of number of plans purchased by the user

   The distribution of intervals between plan orders for users who have purchased more than one plan is shown in Figure 2. It shows that most plans are ordered every 30 days; in other words, most plans are ordered on a regular monthly basis. There are also some users who order multiple plans on the same day. The data set for the classification models consists of variables describing user behavior (Table 1), calculated on the 25th day after the purchase of the first plan. Class labels indicate whether the customer has purchased a second plan within 35 days after the first plan. Five different methods are used for classification: k-Nearest Neighbors, Support Vector Machine, Decision Tree, Random Forest and Naïve Bayes classifier. This definition of churn can only be used to predict churn among consumers purchasing monthly plans, so it was decided to define churn in another, more universal way.

   Figure 2: Frequency of number of days between plans

2. The user is classified as a churned customer if the services provided by the company are not used for 25 consecutive days (the user does not call anyone).

   To find the optimal interval of days after which the user can be treated as leaving, rather than just taking a break between calls, the percentage of users who returned to the system after x days of inactivity is calculated. In the graph shown in Figure 3, the abscissa reflects the number of inactive days x, and the ordinate corresponds to the number of users (in percent). The blue bar shows the percentage of users who had an x-day interval between calls, and the red bar represents the percentage of users who returned to the system after x days (called again). It can be seen from this graph that almost all users have had a one-day interval (x = 1) between calls and only about 60% of them have returned to the system after this interval. Nearly 80% of users have had a thirty-day interval (x = 30) between calls, with less than 25% returning to the system. There is no clear break in the number of users who have not returned to the system, but there is a steady decrease in the number of users who have returned to the system. It was decided that 25 days is a sufficient period of inactivity to consider a user as having left the system.

   Figure 3: Frequency of users returning to the system after x days of inactivity

   The user is monitored from the first day of registration until churn (25 inactive days in a row). In this case, it is not the static variables that are observed, but their change over time.

   There are times when, after a long break (after the so-called churn), the user returns to the system and starts using the services again. For such cases, the algorithm is designed so that a withdrawn customer is still monitored, and a customer who returns to the system (calls again) is treated as a newly logged-in user.

6. Experiments

6.1. Evaluating the purchase of a plan

User classification is performed by dividing customers into these two groups:

   • Active: re-orders a plan within 35 days after the first order of the plan.
   • Withdrawn: did not order another plan within 35 days after ordering the first plan.

The selected forecast period is 10 days, the gap between day 25, when the features are computed, and day 35, when the class label is determined. User data is tracked for 25 days from the first plan purchase. Based on these data, the characteristics describing the user's behavior are calculated on the 25th day after the purchase of the first plan. An attempt is then made to assign the user to one of the classes (predicted to remain in the system or leave). The characteristics describing user activity are presented in Table 1.

Customers are divided into model training (70% of the data) and testing (30% of the data) sets. Five different methods are used for classification: k-Nearest Neighbors, Support Vector Machine, Decision Tree, Random Forest and Naïve Bayes classifier.

The values of the confusion matrix elements evaluating the accuracy of the listed classification methods and the percentage accuracy of all models are given in Table 2. It can be seen that the SVM model achieves the best accuracy.

Table 2
Evaluation of classification models accuracy

  Method                    TP    TN     FP    FN     Accuracy
  k-Nearest Neighbors       634   3747   650   1308   69.11 %
  Support Vector Machine    631   4094   303   1311   74.54 %
  Decision Tree             688   4028   369   1254   74.40 %
  Random Forest             780   3937   460   1162   74.41 %
  Naïve Bayes classifier    629   3931   466   1313   71.94 %
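The accuracy column of Table 2 follows from the confusion-matrix counts as (TP + TN) / (TP + TN + FP + FN). The short Python check below recomputes it from the published counts; the dictionary and function names are illustrative only and are not taken from the study.

```python
# Confusion-matrix counts copied from Table 2, in the order (TP, TN, FP, FN).
TABLE_2 = {
    "k-Nearest Neighbors": (634, 3747, 650, 1308),
    "Support Vector Machine": (631, 4094, 303, 1311),
    "Decision Tree": (688, 4028, 369, 1254),
    "Random Forest": (780, 3937, 460, 1162),
    "Naive Bayes classifier": (629, 3931, 466, 1313),
}


def accuracy(tp, tn, fp, fn):
    """Share of all test users whose class was predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Accuracy in percent, rounded to two decimals as in Table 2.
accuracies = {
    name: round(100 * accuracy(*counts), 2) for name, counts in TABLE_2.items()
}
best = max(accuracies, key=accuracies.get)

for name, value in accuracies.items():
    print(f"{name}: {value:.2f} %")
print("best:", best)
```

Each row of Table 2 sums to 6339 test users, and the recomputed values match the reported accuracies, with the Support Vector Machine highest.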
6.2. Estimating inactive time intervals between calls

5000 users were randomly selected from the full list of users. For each of them, the variables in Table 1 are calculated for every day from the time the user registers until the user leaves or until the end of the entire data range. This results in a data set in which each user is described not by one row, but by as many rows as the number of days the user has been in the system. With this data set it is possible to track how user activity has changed over time. Any user who leaves the system for more than 25 days (does not call anyone for 25 days) and then returns to it (calls again) is treated as a new user (assigned a new identification number). As a result, the creation of such a user data set increases the number of users to a total of 15435. A training set (70% of these users) is used to create the model.

To select the most appropriate Cox regression model, three different combinations of variables were created and three models were constructed. Table 3 shows the variables for all three models. In each case, only non-correlated, statistically significant variables are included in the model.

Table 3
Variables describing user behavior in Cox models

  Variable                             M1    M2    M3
  active days count                    x     x     x
  mean call duration                   x     x     x
  max call duration                    x           x
  number of contacts                   x     x     x
  null call ratio                      x     x     x
  median days between calls            x     x     x
  max days between calls               x     x
  median days between active days      x     x     x
  total paid                           x     x     x
  last plan before                     x     x     x

The accuracy of these three models was assessed using a test set (30% of users). Four dates in the analyzed period were selected for model testing. Users who are active on a specific date are selected, and the model predicts whether after 10 days they will still be active or will have churned. The same is repeated for each of the four dates. In this way, the real performance of the model is verified, because both short-term and long-term customers are evaluated. Some users may have been analyzed several times at different times. In total, the model evaluated customers 2741 times: in 1976 cases the user did not quit and in 765 cases the user left the system. The accuracy of the models is presented in Table 4. It can be seen that the M1 and M3 models achieved equal accuracy and are more suitable for predicting churn than the M2 model.

Table 4
Evaluation of Cox models accuracy

  Model   TP     TN    FP    FN    Accuracy
  M1      1059   529   236   917   57.94 %
  M2      991    539   226   985   55.82 %
  M3      1059   529   236   917   57.94 %

7. Conclusion

Experiments with the telecommunication customer data set show that:

1. After assessing the specifics of the available data, it was decided to define user activity in two ways: according to the plans purchased and according to the frequency of calls made.
2. In the case where the customer is considered active as long as the call plans offered by the supplier are purchased regularly, the following classification algorithms were used to segment the users: k-Nearest Neighbors, Support Vector Machine, Decision Tree, Random Forest, and Naïve Bayes classifier. Based on the experiments, it can be stated that the Support Vector Machine classifier outperforms the other methods.
3. In the case where user activity is defined by a faster-changing indicator, the frequency of calls, it was decided to use the Cox regression model with time-varying covariates to divide users into groups. This model is superior to classical classification methods in that it can take into account not only static user parameters but also their change over time.

In further research, the possibility of combining these two methods to predict the likelihood of customer churn may be considered.

References

[1] B. Huang, M. T. Kechadi, B. Buckley, Customer churn prediction in telecommunications, Expert Systems with Applications 39 (2012) 1414–1425.
[2] H. Ren, Y. Zheng, Y.-r. Wu, Clustering analysis of telecommunication customers, The Journal of China Universities of Posts and Telecommunications 16 (2009) 114–128. URL: http://www.sciencedirect.com/science/article/pii/S1005888508602149. doi:https://doi.org/10.1016/S1005-8885(08)60214-9.
[3] J. Pamina, B. Raja, S. SathyaBama, M. Sruthi, A. VJ, et al., An effective classifier for predicting churn in telecommunication, Jour of Adv Research in Dynamical & Control Systems 11 (2019).
[4] A. K. Ahmad, A. Jafar, K. Aljoumaa, Customer churn prediction in telecom using machine learning in big data platform, Journal of Big Data 6 (2019) 28.
[5] I. Brânduşoiu, G. Toderean, H. Beleiu, Methods for churn prediction in the pre-paid mobile telecommunications industry, in: 2016 International Conference on Communications (COMM), 2016, pp. 97–100. doi:10.1109/ICComm.2016.7528311.
[6] I. M. Mitkees, S. M. Badr, A. I. B. ElSeddawy, Customer churn prediction model using data mining techniques, in: 2017 13th International Computer Engineering Conference (ICENCO), IEEE, 2017, pp. 262–268.
[7] J. Han, J. Pei, M. Kamber, Data mining: concepts and techniques, Elsevier, 2011.
[8] G. Norkevičius, G. Raškinis, Lietuvių kalbos garsų trukmės modeliavimas klasifikavimo ir regresijos medžiais, naudojant didelės apimties garsyną, Informacinės technologijos 2007: konferencijos pranešimų medžiaga, Kauno technologijos universitetas, 2007 m. sausio 31 d.-vasario 1 d. Kaunas: Technologija, 2007 (2007).
[9] G. Biau, E. Scornet, A random forest guided tour, Test 25 (2016) 197–227.